Skip to content

Expose internal scheduling metadata#2873

Draft
ValentaTomas wants to merge 17 commits into
mainfrom
valenta/snapshot-scheduling-metadata
Draft

Expose internal scheduling metadata#2873
ValentaTomas wants to merge 17 commits into
mainfrom
valenta/snapshot-scheduling-metadata

Conversation

@ValentaTomas
Copy link
Copy Markdown
Member

@ValentaTomas ValentaTomas commented May 30, 2026

Expose compact scheduling metadata from orchestrator and template-manager without DB or public API changes.

Return compact build-chain metadata from snapshot creation so API placement can later use real build ancestry for resume affinity.
@cla-bot cla-bot Bot added the cla-signed label May 30, 2026
@cursor
Copy link
Copy Markdown

cursor Bot commented May 30, 2026

PR Summary

Medium Risk
Backward-compatible gRPC additions, but Pause’s return type changes and metadata can be nil when headers or devices are not ready, so callers must handle optional fields.

Overview
This PR threads scheduling metadata through orchestrator and template-manager gRPC so schedulers can reason about build chains and local cache footprint without new DB or public API fields.

A shared SchedulingMetadata message (base build, generation, memfile/rootfs sizes, and parallel arrays for up to 32 contributing builds with logical bytes and mapping counts) is derived from memfile/rootfs diff headers via scheduling.FromHeaders. Snapshots compute it when pause headers resolve (with warnings if memfile dedup headers are still pending); cached templates expose it after devices are fetched.

Sandbox create, pause, and checkpoint responses now include this metadata; pause returns a typed response instead of empty. List cached builds merges template TTL/scheduling info with per-build memfile/rootfs cached vs total bytes from the diff store, using true logical size where chunkers only partially fetched data. Completed template builds attach the same shape on TemplateBuildMetadata for persistence downstream.

Reviewed by Cursor Bugbot for commit d48f099. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread packages/orchestrator/pkg/server/sandboxes.go
Comment thread packages/db/queries/snapshots/update_snapshot_scheduling_metadata.sql Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

In update_snapshot_scheduling_metadata.sql, using to_jsonb on a text parameter stores it as an escaped string scalar rather than a JSONB object, so the parameter should be cast directly to jsonb. In sandboxes.go, accessing res.SchedulingMetadata will cause a compilation error because the snapshotResult struct lacks this field. Additionally, NewSnapshotSchedulingMetadata in snapshot.go is missing nil checks for the Mapping fields of the headers, which can lead to a nil pointer dereference panic.

Comment thread packages/db/queries/snapshots/update_snapshot_scheduling_metadata.sql Outdated
Comment thread packages/orchestrator/pkg/server/sandboxes.go
Comment thread packages/orchestrator/pkg/server/sandboxes.go
Comment thread packages/orchestrator/pkg/sandbox/snapshot.go Outdated
Populate pause/checkpoint responses from snapshot results and store scheduling metadata as JSONB.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Pause fails after orchestrator pause
    • Added failSnapshotBuild call when updateSnapshotSchedulingMetadata fails after successful pause RPC to properly mark build status as failed.
  • ✅ Fixed: Metadata error finishes snapshot success
    • Added finish(err) call before returning when updateSnapshotSchedulingMetadata fails to prevent deferred finish(nil) from incorrectly marking snapshot as success.

Create PR

Or push these changes by commenting:

@cursor push caa2284235
Preview (caa2284235)
diff --git a/packages/api/internal/orchestrator/pause_instance.go b/packages/api/internal/orchestrator/pause_instance.go
--- a/packages/api/internal/orchestrator/pause_instance.go
+++ b/packages/api/internal/orchestrator/pause_instance.go
@@ -52,6 +52,8 @@
 	}
 	if schedulingMetadata != nil {
 		if err := o.updateSnapshotSchedulingMetadata(ctx, sbx.SandboxID, schedulingMetadata); err != nil {
+			o.failSnapshotBuild(ctx, result.BuildID, err)
+
 			return err
 		}
 	}

diff --git a/packages/api/internal/orchestrator/snapshot_template.go b/packages/api/internal/orchestrator/snapshot_template.go
--- a/packages/api/internal/orchestrator/snapshot_template.go
+++ b/packages/api/internal/orchestrator/snapshot_template.go
@@ -106,6 +106,8 @@
 		if err := o.updateSnapshotSchedulingMetadata(ctx, sbx.SandboxID, metadata); err != nil {
 			o.failSnapshotBuild(ctx, upsertResult.BuildID, err)
 
+			finish(err)
+
 			return SnapshotTemplateResult{}, err
 		}
 	}

You can send follow-ups to the cloud agent here.

Comment thread packages/api/internal/orchestrator/pause_instance.go Outdated
Comment thread packages/api/internal/orchestrator/snapshot_template.go Outdated
Do not fail a successful pause or checkpoint if persisting optional scheduling metadata fails.
@cursor

This comment has been minimized.

Use slices.SortFunc and remove an unnecessary conversion in metadata generation.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Limit one drops parent build
    • Added condition to prevent base build from overwriting parent build when chain limit is 1 and last element is the parent.

Create PR

Or push these changes by commenting:

@cursor push 527fa53cf5
Preview (527fa53cf5)
diff --git a/packages/orchestrator/pkg/sandbox/snapshot.go b/packages/orchestrator/pkg/sandbox/snapshot.go
--- a/packages/orchestrator/pkg/sandbox/snapshot.go
+++ b/packages/orchestrator/pkg/sandbox/snapshot.go
@@ -139,7 +139,7 @@
 					break
 				}
 			}
-			if !hasBase {
+			if !hasBase && chain[len(chain)-1].buildID != parentBuildID {
 				if c, ok := contributions[baseBuildID]; ok {
 					chain[len(chain)-1] = *c
 				} else {

You can send follow-ups to the cloud agent here.

Comment thread packages/orchestrator/pkg/sandbox/snapshot.go Outdated
Avoid replacing the only chain entry with the base build because base build is already stored separately.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Nested metadata breaks JSONBStringMap scans
    • Changed update_snapshot_scheduling_metadata.sql to use to_jsonb(@metadata::text) instead of (@metadata::text)::jsonb, storing the scheduling metadata as a JSON string value that JSONBStringMap can unmarshal correctly.
  • ✅ Fixed: Upsert clears scheduling metadata key
    • Changed create_new_snapshot.sql conflict handler to use snapshots.metadata || excluded.metadata instead of excluded.metadata, preserving existing metadata keys including snapshot_scheduling_metadata during upsert.

Create PR

Or push these changes by commenting:

@cursor push c71543e849
Preview (c71543e849)
diff --git a/packages/db/queries/create_new_snapshot.sql.go b/packages/db/queries/create_new_snapshot.sql.go
--- a/packages/db/queries/create_new_snapshot.sql.go
+++ b/packages/db/queries/create_new_snapshot.sql.go
@@ -55,7 +55,7 @@
             now()
    )
     ON CONFLICT (sandbox_id) DO UPDATE SET
-        metadata = excluded.metadata,
+        metadata = snapshots.metadata || excluded.metadata,
         sandbox_started_at = excluded.sandbox_started_at,
         allow_internet_access = COALESCE(excluded.allow_internet_access, snapshots.allow_internet_access),
         origin_node_id = excluded.origin_node_id,

diff --git a/packages/db/queries/snapshots/create_new_snapshot.sql b/packages/db/queries/snapshots/create_new_snapshot.sql
--- a/packages/db/queries/snapshots/create_new_snapshot.sql
+++ b/packages/db/queries/snapshots/create_new_snapshot.sql
@@ -41,7 +41,7 @@
             now()
    )
     ON CONFLICT (sandbox_id) DO UPDATE SET
-        metadata = excluded.metadata,
+        metadata = snapshots.metadata || excluded.metadata,
         sandbox_started_at = excluded.sandbox_started_at,
         allow_internet_access = COALESCE(excluded.allow_internet_access, snapshots.allow_internet_access),
         origin_node_id = excluded.origin_node_id,

diff --git a/packages/db/queries/snapshots/update_snapshot_scheduling_metadata.sql b/packages/db/queries/snapshots/update_snapshot_scheduling_metadata.sql
--- a/packages/db/queries/snapshots/update_snapshot_scheduling_metadata.sql
+++ b/packages/db/queries/snapshots/update_snapshot_scheduling_metadata.sql
@@ -3,7 +3,7 @@
 SET metadata = jsonb_set(
     metadata,
     '{snapshot_scheduling_metadata}',
-    (@metadata::text)::jsonb,
+    to_jsonb(@metadata::text),
     true
 )
 WHERE sandbox_id = @sandbox_id;

diff --git a/packages/db/queries/update_snapshot_scheduling_metadata.sql.go b/packages/db/queries/update_snapshot_scheduling_metadata.sql.go
--- a/packages/db/queries/update_snapshot_scheduling_metadata.sql.go
+++ b/packages/db/queries/update_snapshot_scheduling_metadata.sql.go
@@ -1,6 +1,6 @@
 // Code generated by sqlc. DO NOT EDIT.
 // versions:
-//   sqlc v1.31.1
+//   sqlc v1.29.0
 // source: update_snapshot_scheduling_metadata.sql
 
 package queries
@@ -14,7 +14,7 @@
 SET metadata = jsonb_set(
     metadata,
     '{snapshot_scheduling_metadata}',
-    ($1::text)::jsonb,
+    to_jsonb($1::text),
     true
 )
 WHERE sandbox_id = $2

You can send follow-ups to the cloud agent here.

Comment thread packages/db/queries/snapshots/update_snapshot_scheduling_metadata.sql Outdated
Comment thread packages/api/internal/orchestrator/pause_instance.go Outdated
Store scheduling metadata as a string value in the existing metadata map and keep it across snapshot upserts until refreshed.
@ValentaTomas ValentaTomas marked this pull request as ready for review May 30, 2026 21:58
Comment thread packages/api/internal/orchestrator/pause_instance.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/snapshot.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/snapshot.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/snapshot.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/snapshot.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/snapshot.go Outdated
@ValentaTomas ValentaTomas changed the title Persist snapshot scheduling metadata Expose internal scheduling metadata May 31, 2026
Comment thread packages/orchestrator/pkg/sandbox/sandbox.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/template/storage_template.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 055b4e0d2a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/orchestrator/pkg/sandbox/sandbox.go Outdated
Comment thread packages/orchestrator/pkg/scheduling/metadata.go Outdated
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Generation incremented twice
    • Removed the duplicate generation increment from FromHeaders and moved it to only apply in NewSnapshotSchedulingMetadata for pre-snapshot headers.

Create PR

Or push these changes by commenting:

@cursor push 9d1cfd4f72
Preview (9d1cfd4f72)
diff --git a/packages/orchestrator/pkg/sandbox/snapshot.go b/packages/orchestrator/pkg/sandbox/snapshot.go
--- a/packages/orchestrator/pkg/sandbox/snapshot.go
+++ b/packages/orchestrator/pkg/sandbox/snapshot.go
@@ -58,5 +58,9 @@
 }
 
 func NewSnapshotSchedulingMetadata(memfileHeader, rootfsHeader *header.Header) *orchestrator.SchedulingMetadata {
-	return scheduling.FromHeaders(memfileHeader, rootfsHeader)
+	metadata := scheduling.FromHeaders(memfileHeader, rootfsHeader)
+	if metadata != nil {
+		metadata.Generation++
+	}
+	return metadata
 }

diff --git a/packages/orchestrator/pkg/scheduling/metadata.go b/packages/orchestrator/pkg/scheduling/metadata.go
--- a/packages/orchestrator/pkg/scheduling/metadata.go
+++ b/packages/orchestrator/pkg/scheduling/metadata.go
@@ -93,7 +93,7 @@
 
 	res := &orchestrator.SchedulingMetadata{
 		BaseBuildId:          baseBuildID.String(),
-		Generation:           max(memfileHeader.Metadata.Generation, rootfsHeader.Metadata.Generation) + 1,
+		Generation:           max(memfileHeader.Metadata.Generation, rootfsHeader.Metadata.Generation),
 		ChainBuildIds:        make([]string, 0, len(chain)),
 		MemfileLogicalBytes:  make([]uint64, 0, len(chain)),
 		MemfileChunkCounts:   make([]uint32, 0, len(chain)),

You can send follow-ups to the cloud agent here.

Comment thread packages/orchestrator/pkg/scheduling/metadata.go Outdated
@cursor
Copy link
Copy Markdown

cursor Bot commented May 31, 2026

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Pause metadata omits snapshot layer
    • Changed NewSnapshotSchedulingMetadata to compute from diff headers (memfileDiffHeader, rootfsDiffHeader) instead of original parent headers, ensuring the new snapshot buildID and its storage contributions are included in the scheduling metadata.
  • ✅ Fixed: Cached metadata skips pending headers
    • Changed SchedulingMetadata() to use WaitWithContext instead of Result, allowing it to wait for asynchronously-resolving headers rather than returning nil when headers are still pending.

Create PR

Or push these changes by commenting:

@cursor push 526c21400a
Preview (526c21400a)
diff --git a/packages/orchestrator/pkg/sandbox/sandbox.go b/packages/orchestrator/pkg/sandbox/sandbox.go
--- a/packages/orchestrator/pkg/sandbox/sandbox.go
+++ b/packages/orchestrator/pkg/sandbox/sandbox.go
@@ -1185,7 +1185,7 @@
 	}
 	cleanup.AddNoContext(ctx, rootfsDiff.Close)
 
-	schedulingMetadata := NewSnapshotSchedulingMetadata(originalMemfile.Header(), originalRootfs.Header())
+	schedulingMetadata := NewSnapshotSchedulingMetadata(ctx, memfileDiffHeader, NewResolvedDiffHeader(rootfsHeader))
 
 	metadataFileLink := template.NewLocalFileLink(cachePaths.CacheMetadata())
 	cleanup.AddNoContext(ctx, metadataFileLink.Close)

diff --git a/packages/orchestrator/pkg/sandbox/snapshot.go b/packages/orchestrator/pkg/sandbox/snapshot.go
--- a/packages/orchestrator/pkg/sandbox/snapshot.go
+++ b/packages/orchestrator/pkg/sandbox/snapshot.go
@@ -57,6 +57,12 @@
 	return nil
 }
 
-func NewSnapshotSchedulingMetadata(memfileHeader, rootfsHeader *header.Header) *orchestrator.SchedulingMetadata {
+func NewSnapshotSchedulingMetadata(ctx context.Context, memfileDiffHeader, rootfsDiffHeader *DiffHeader) *orchestrator.SchedulingMetadata {
+	memfileHeader, memfileErr := memfileDiffHeader.WaitWithContext(ctx)
+	rootfsHeader, rootfsErr := rootfsDiffHeader.WaitWithContext(ctx)
+	if memfileErr != nil || rootfsErr != nil {
+		return nil
+	}
+
 	return scheduling.FromHeaders(memfileHeader, rootfsHeader)
 }

diff --git a/packages/orchestrator/pkg/sandbox/template/storage_template.go b/packages/orchestrator/pkg/sandbox/template/storage_template.go
--- a/packages/orchestrator/pkg/sandbox/template/storage_template.go
+++ b/packages/orchestrator/pkg/sandbox/template/storage_template.go
@@ -272,9 +272,9 @@
 	return t.paths
 }
 
-func (t *storageTemplate) SchedulingMetadata() *orchestrator.SchedulingMetadata {
-	memfileHeader, memfileErr := t.memfileHeader.Result()
-	rootfsHeader, rootfsErr := t.rootfsHeader.Result()
+func (t *storageTemplate) SchedulingMetadata(ctx context.Context) *orchestrator.SchedulingMetadata {
+	memfileHeader, memfileErr := t.memfileHeader.WaitWithContext(ctx)
+	rootfsHeader, rootfsErr := t.rootfsHeader.WaitWithContext(ctx)
 	if memfileErr != nil || rootfsErr != nil {
 		return nil
 	}

diff --git a/packages/orchestrator/pkg/server/template_cache.go b/packages/orchestrator/pkg/server/template_cache.go
--- a/packages/orchestrator/pkg/server/template_cache.go
+++ b/packages/orchestrator/pkg/server/template_cache.go
@@ -20,9 +20,9 @@
 	for key, item := range s.templateCache.Items() {
 		var metadata *orchestrator.SchedulingMetadata
 		if provider, ok := item.Value().(interface {
-			SchedulingMetadata() *orchestrator.SchedulingMetadata
+			SchedulingMetadata(context.Context) *orchestrator.SchedulingMetadata
 		}); ok {
-			metadata = provider.SchedulingMetadata()
+			metadata = provider.SchedulingMetadata(ctx)
 		}
 		builds = append(builds, &orchestrator.CachedBuildInfo{
 			BuildId:            key,

diff --git a/packages/orchestrator/pkg/template/build/builder.go b/packages/orchestrator/pkg/template/build/builder.go
--- a/packages/orchestrator/pkg/template/build/builder.go
+++ b/packages/orchestrator/pkg/template/build/builder.go
@@ -390,23 +390,23 @@
 		KernelVersion:      bc.Config.KernelVersion,
 		FirecrackerVersion: bc.Config.FirecrackerVersion,
 		RootfsSizeMB:       units.BytesToMB(int64(rootfsSize)),
-		SchedulingMetadata: templateSchedulingMetadata(builder.templateCache, lastLayerResult.Metadata.Template.BuildID),
+		SchedulingMetadata: templateSchedulingMetadata(ctx, builder.templateCache, lastLayerResult.Metadata.Template.BuildID),
 	}, nil
 }
 
-func templateSchedulingMetadata(cache *sbxtemplate.Cache, buildID string) *templatemanager.TemplateSchedulingMetadata {
+func templateSchedulingMetadata(ctx context.Context, cache *sbxtemplate.Cache, buildID string) *templatemanager.TemplateSchedulingMetadata {
 	t, ok := cache.GetCachedTemplate(buildID)
 	if !ok {
 		return nil
 	}
 	provider, ok := t.(interface {
-		SchedulingMetadata() *orchestratorgrpc.SchedulingMetadata
+		SchedulingMetadata(context.Context) *orchestratorgrpc.SchedulingMetadata
 	})
 	if !ok {
 		return nil
 	}
 
-	return toTemplateSchedulingMetadata(provider.SchedulingMetadata())
+	return toTemplateSchedulingMetadata(provider.SchedulingMetadata(ctx))
 }
 
 func toTemplateSchedulingMetadata(m *orchestratorgrpc.SchedulingMetadata) *templatemanager.TemplateSchedulingMetadata {

You can send follow-ups to the cloud agent here.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d21605098a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/orchestrator/pkg/sandbox/template/storage_template.go Outdated
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Scheduling metadata uses unset headers
    • Updated Fetch() to propagate loaded headers back to memfileHeader and rootfsHeader SetOnce fields when initial headers are nil, ensuring SchedulingMetadata() returns correct values.

Create PR

Or push these changes by commenting:

@cursor push e86ca15145
Preview (e86ca15145)
diff --git a/packages/orchestrator/pkg/sandbox/template/storage_template.go b/packages/orchestrator/pkg/sandbox/template/storage_template.go
--- a/packages/orchestrator/pkg/sandbox/template/storage_template.go
+++ b/packages/orchestrator/pkg/sandbox/template/storage_template.go
@@ -206,6 +206,12 @@
 			return nil
 		}
 
+		if memHdr == nil {
+			if err := t.memfileHeader.SetValue(memfileStorage.Header()); err != nil {
+				return fmt.Errorf("failed to update memfile header: %w", err)
+			}
+		}
+
 		if err := t.memfile.SetValue(memfileStorage); err != nil {
 			return fmt.Errorf("failed to set memfile value: %w", err)
 		}
@@ -243,6 +249,12 @@
 			return nil
 		}
 
+		if rootHdr == nil {
+			if err := t.rootfsHeader.SetValue(rootfsStorage.Header()); err != nil {
+				return fmt.Errorf("failed to update rootfs header: %w", err)
+			}
+		}
+
 		if err := t.rootfs.SetValue(rootfsStorage); err != nil {
 			return fmt.Errorf("failed to set rootfs value: %w", err)
 		}

You can send follow-ups to the cloud agent here.

Comment thread packages/orchestrator/pkg/sandbox/template/storage_template.go Outdated
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: ListCachedBuilds omits proto fields
    • Restored expiration_time and scheduling_metadata fields by combining data from both template cache and diff store, ensuring all cached builds are listed with complete information.

Create PR

Or push these changes by commenting:

@cursor push 425e3ebf86
Preview (425e3ebf86)
diff --git a/packages/orchestrator/pkg/server/template_cache.go b/packages/orchestrator/pkg/server/template_cache.go
--- a/packages/orchestrator/pkg/server/template_cache.go
+++ b/packages/orchestrator/pkg/server/template_cache.go
@@ -7,24 +7,47 @@
 
 	"github.com/e2b-dev/infra/packages/shared/pkg/grpc/orchestrator"
 	"google.golang.org/protobuf/types/known/emptypb"
+	"google.golang.org/protobuf/types/known/timestamppb"
 )
 
 func (s *Server) ListCachedBuilds(ctx context.Context, _ *emptypb.Empty) (*orchestrator.SandboxListCachedBuildsResponse, error) {
 	_, childSpan := tracer.Start(ctx, "list-cached-templates")
 	defer childSpan.End()
 
-	var builds []*orchestrator.CachedBuildInfo
+	buildMap := make(map[string]*orchestrator.CachedBuildInfo)
 
+	for key, item := range s.templateCache.Items() {
+		info := &orchestrator.CachedBuildInfo{
+			BuildId:        key,
+			ExpirationTime: timestamppb.New(item.ExpiresAt()),
+		}
+		if provider, ok := item.Value().(interface {
+			SchedulingMetadata() *orchestrator.SchedulingMetadata
+		}); ok {
+			info.SchedulingMetadata = provider.SchedulingMetadata()
+		}
+		buildMap[key] = info
+	}
+
 	for _, stats := range s.templateCache.CachedBuildStats(ctx) {
-		builds = append(builds, &orchestrator.CachedBuildInfo{
-			BuildId:             stats.BuildID,
-			MemfileCachedBytes:  stats.MemfileCachedBytes,
-			MemfileCachedChunks: stats.MemfileCachedChunks,
-			RootfsCachedBytes:   stats.RootfsCachedBytes,
-			RootfsCachedChunks:  stats.RootfsCachedChunks,
-		})
+		info := buildMap[stats.BuildID]
+		if info == nil {
+			info = &orchestrator.CachedBuildInfo{
+				BuildId: stats.BuildID,
+			}
+			buildMap[stats.BuildID] = info
+		}
+		info.MemfileCachedBytes = stats.MemfileCachedBytes
+		info.MemfileCachedChunks = stats.MemfileCachedChunks
+		info.RootfsCachedBytes = stats.RootfsCachedBytes
+		info.RootfsCachedChunks = stats.RootfsCachedChunks
 	}
 
+	builds := make([]*orchestrator.CachedBuildInfo, 0, len(buildMap))
+	for _, info := range buildMap {
+		builds = append(builds, info)
+	}
+
 	return &orchestrator.SandboxListCachedBuildsResponse{
 		Builds: builds,
 	}, nil

You can send follow-ups to the cloud agent here.

Comment thread packages/orchestrator/pkg/server/template_cache.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0402551cdf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/orchestrator/pkg/server/template_cache.go
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8779771c62

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/orchestrator/pkg/template/build/builder.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a28185d211

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/orchestrator/pkg/sandbox/build/cache.go Outdated
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Create scheduling metadata often nil
    • Modified GetTemplate to use unresolved SetOnce instances and Fetch to load headers from storage, ensuring SchedulingMetadata returns actual header data instead of nil.

Create PR

Or push these changes by commenting:

@cursor push 74c45d8201
Preview (74c45d8201)
diff --git a/packages/orchestrator/pkg/sandbox/template/cache.go b/packages/orchestrator/pkg/sandbox/template/cache.go
--- a/packages/orchestrator/pkg/sandbox/template/cache.go
+++ b/packages/orchestrator/pkg/sandbox/template/cache.go
@@ -193,8 +193,8 @@
 	storageTemplate, err := newTemplateFromStorage(
 		c.config.BuilderConfig,
 		buildID,
-		resolvedHeader(nil),
-		resolvedHeader(nil),
+		utils.NewSetOnce[*header.Header](),
+		utils.NewSetOnce[*header.Header](),
 		persistence,
 		c.blockMetrics,
 		nil,

diff --git a/packages/orchestrator/pkg/sandbox/template/storage_template.go b/packages/orchestrator/pkg/sandbox/template/storage_template.go
--- a/packages/orchestrator/pkg/sandbox/template/storage_template.go
+++ b/packages/orchestrator/pkg/sandbox/template/storage_template.go
@@ -176,14 +176,28 @@
 	})
 
 	wg.Go(func() error {
-		memHdr, hdrErr := t.memfileHeader.WaitWithContext(ctx)
+		memHdr, hdrErr := t.memfileHeader.Result()
 		if hdrErr != nil {
-			errMsg := fmt.Errorf("failed to resolve memfile header: %w", hdrErr)
-			if err := t.memfile.SetError(errMsg); err != nil {
-				return fmt.Errorf("failed to set memfile error: %w", errors.Join(errMsg, err))
+			if _, notSet := hdrErr.(utils.NotSetError); notSet {
+				// Header not set yet, load it from storage
+				loadedHdr, loadErr := header.LoadHeader(ctx, t.persistence, t.paths.MemfileHeader())
+				if loadErr != nil && !errors.Is(loadErr, storage.ErrObjectNotExist) {
+					errMsg := fmt.Errorf("failed to load memfile header: %w", loadErr)
+					if err := t.memfile.SetError(errMsg); err != nil {
+						return fmt.Errorf("failed to set memfile error: %w", errors.Join(errMsg, err))
+					}
+					return nil
+				}
+				// Set the header (may be nil if not found)
+				_ = t.memfileHeader.SetValue(loadedHdr)
+				memHdr = loadedHdr
+			} else {
+				errMsg := fmt.Errorf("failed to resolve memfile header: %w", hdrErr)
+				if err := t.memfile.SetError(errMsg); err != nil {
+					return fmt.Errorf("failed to set memfile error: %w", errors.Join(errMsg, err))
+				}
+				return nil
 			}
-
-			return nil
 		}
 
 		memfileStorage, memfileErr := NewStorage(
@@ -214,14 +228,28 @@
 	})
 
 	wg.Go(func() error {
-		rootHdr, hdrErr := t.rootfsHeader.WaitWithContext(ctx)
+		rootHdr, hdrErr := t.rootfsHeader.Result()
 		if hdrErr != nil {
-			errMsg := fmt.Errorf("failed to resolve rootfs header: %w", hdrErr)
-			if err := t.rootfs.SetError(errMsg); err != nil {
-				return fmt.Errorf("failed to set rootfs error: %w", errors.Join(errMsg, err))
+			if _, notSet := hdrErr.(utils.NotSetError); notSet {
+				// Header not set yet, load it from storage
+				loadedHdr, loadErr := header.LoadHeader(ctx, t.persistence, t.paths.RootfsHeader())
+				if loadErr != nil && !errors.Is(loadErr, storage.ErrObjectNotExist) {
+					errMsg := fmt.Errorf("failed to load rootfs header: %w", loadErr)
+					if err := t.rootfs.SetError(errMsg); err != nil {
+						return fmt.Errorf("failed to set rootfs error: %w", errors.Join(errMsg, err))
+					}
+					return nil
+				}
+				// Set the header (may be nil if not found)
+				_ = t.rootfsHeader.SetValue(loadedHdr)
+				rootHdr = loadedHdr
+			} else {
+				errMsg := fmt.Errorf("failed to resolve rootfs header: %w", hdrErr)
+				if err := t.rootfs.SetError(errMsg); err != nil {
+					return fmt.Errorf("failed to set rootfs error: %w", errors.Join(errMsg, err))
+				}
+				return nil
 			}
-
-			return nil
 		}
 
 		rootfsStorage, rootfsErr := NewStorage(

You can send follow-ups to the cloud agent here.

Comment thread packages/orchestrator/pkg/server/sandboxes.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7abe525eae

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/orchestrator/pkg/server/template_cache.go Outdated
@ValentaTomas ValentaTomas requested a review from dobrac June 1, 2026 02:26
@ValentaTomas ValentaTomas marked this pull request as draft June 1, 2026 02:32
Derive scheduling metadata from resolved device/diff headers so pause,
create, cached-build listing and template builds report the actual build,
its own diff contributions, and the on-disk generation. Report logical vs
on-disk byte stats and merge template cache entries into ListCachedBuilds.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Uninitialized storage diff stats panic
    • Added nil check in StorageDiff.FileSize to return error instead of dereferencing nil chunker when called on uninitialized diffs.
  • ✅ Fixed: Pause hides header wait failures
    • Changed NewSnapshotSchedulingMetadata to return error when header wait fails, causing pause/checkpoint operations to fail instead of silently omitting metadata.

Create PR

Or push these changes by commenting:

@cursor push 5691fa6c77
Preview (5691fa6c77)
diff --git a/packages/orchestrator/pkg/sandbox/build/storage_diff.go b/packages/orchestrator/pkg/sandbox/build/storage_diff.go
--- a/packages/orchestrator/pkg/sandbox/build/storage_diff.go
+++ b/packages/orchestrator/pkg/sandbox/build/storage_diff.go
@@ -128,6 +128,10 @@
 }
 
 func (b *StorageDiff) FileSize(ctx context.Context) (int64, error) {
+	if b.chunker == nil {
+		return 0, fmt.Errorf("chunker not initialized")
+	}
+
 	return b.chunker.FileSize(ctx)
 }
 

diff --git a/packages/orchestrator/pkg/sandbox/sandbox.go b/packages/orchestrator/pkg/sandbox/sandbox.go
--- a/packages/orchestrator/pkg/sandbox/sandbox.go
+++ b/packages/orchestrator/pkg/sandbox/sandbox.go
@@ -1186,7 +1186,10 @@
 	cleanup.AddNoContext(ctx, rootfsDiff.Close)
 
 	rootfsDiffHeader := NewResolvedDiffHeader(rootfsHeader)
-	schedulingMetadata := NewSnapshotSchedulingMetadata(ctx, memfileDiffHeader, rootfsDiffHeader)
+	schedulingMetadata, err := NewSnapshotSchedulingMetadata(ctx, memfileDiffHeader, rootfsDiffHeader)
+	if err != nil {
+		return nil, fmt.Errorf("failed to resolve scheduling metadata: %w", err)
+	}
 
 	metadataFileLink := template.NewLocalFileLink(cachePaths.CacheMetadata())
 	cleanup.AddNoContext(ctx, metadataFileLink.Close)

diff --git a/packages/orchestrator/pkg/sandbox/snapshot.go b/packages/orchestrator/pkg/sandbox/snapshot.go
--- a/packages/orchestrator/pkg/sandbox/snapshot.go
+++ b/packages/orchestrator/pkg/sandbox/snapshot.go
@@ -59,12 +59,16 @@
 
 // NewSnapshotSchedulingMetadata derives the metadata from the new snapshot's
 // diff headers so the freshly created build and its contributions are included.
-func NewSnapshotSchedulingMetadata(ctx context.Context, memfileDiffHeader, rootfsDiffHeader *DiffHeader) *orchestrator.SchedulingMetadata {
+func NewSnapshotSchedulingMetadata(ctx context.Context, memfileDiffHeader, rootfsDiffHeader *DiffHeader) (*orchestrator.SchedulingMetadata, error) {
 	memfileHeader, memfileErr := memfileDiffHeader.WaitWithContext(ctx)
+	if memfileErr != nil {
+		return nil, fmt.Errorf("memfile header wait failed: %w", memfileErr)
+	}
+
 	rootfsHeader, rootfsErr := rootfsDiffHeader.WaitWithContext(ctx)
-	if memfileErr != nil || rootfsErr != nil {
-		return nil
+	if rootfsErr != nil {
+		return nil, fmt.Errorf("rootfs header wait failed: %w", rootfsErr)
 	}
 
-	return scheduling.FromHeaders(memfileHeader, rootfsHeader)
+	return scheduling.FromHeaders(memfileHeader, rootfsHeader), nil
 }

You can send follow-ups to the cloud agent here.

Comment thread packages/orchestrator/pkg/sandbox/build/cache.go
Comment thread packages/orchestrator/pkg/sandbox/snapshot.go
…adata

FileSize panicked on diffs cached before Init (nil chunker), matching the
nil guards already on Size/IsCached/Close. NewSnapshotSchedulingMetadata now
warns when a diff header fails to resolve instead of silently returning nil.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issues. You can view the agent here.

Reviewed by Cursor Bugbot for commit d48f099. Configure here.

return nil
}

return scheduling.FromHeaders(memfileHeader, rootfsHeader)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pause waits on memfile header

Medium Severity

NewSnapshotSchedulingMetadata blocks on memfileDiffHeader.WaitWithContext during snapshot creation, so Pause/Checkpoint cannot finish until memfile dedup/header work completes. That reverses the async memfile header path meant to let pause return before compare finishes, lengthening the RPC critical path when dedup is enabled.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit d48f099. Configure here.

})
if len(chain) > chainLimit {
chain = chain[:chainLimit]
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chain cap is thirty two

Medium Severity

Build chain truncation uses chainLimit 32, while the intended hard maximum discussed for this change is 128. Placement can drop lineage entries earlier than the agreed cap, skewing scheduling metadata for deep build chains.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit d48f099. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants